NSF PAR Search | NSF Public Access Repository

Search Result Diversification Using Query Aspects as Bottlenecks

https://doi.org/10.1145/3583780.3615050

Yu, Puxuan; Rahimi, Razieh; Huang, Zhiqi; Allan, James (October 2023, ACM)

We address some of the limitations of coverage-based search result diversification models, which often consist of separate components and rely on external systems for query aspects. To overcome these challenges, we introduce an end-to-end learning framework called DUB. Our approach preserves the intrinsic interpretability of coverage-based methods while enhancing diversification performance. Drawing inspiration from the information bottleneck method, we propose an aspect extractor that generates query aspect embeddings optimized as information bottlenecks for the task of diversified document re-ranking. Experimental results demonstrate that DUB outperforms state-of-the-art diversification models.
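For readers unfamiliar with coverage-based diversification, the following is a minimal sketch of a generic greedy, aspect-coverage re-ranker in the style of xQuAD. It is not the DUB model described above: the function name `greedy_diversify`, the `trade_off` parameter, and the toy relevance and coverage values are all assumptions made for illustration, and the aspect coverage scores are supplied by hand rather than learned.

```python
from typing import Dict, List


def greedy_diversify(
    relevance: Dict[str, float],
    coverage: Dict[str, List[float]],
    aspect_weights: List[float],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedy coverage-based re-ranking (illustrative xQuAD-style sketch).

    relevance[d]      : relevance score of document d to the query
    coverage[d][i]    : degree (0..1) to which document d covers aspect i
    aspect_weights[i] : importance of aspect i
    """
    remaining = set(relevance)
    # uncovered[i] is how much of aspect i the current selection leaves unsatisfied.
    uncovered = [1.0] * len(aspect_weights)
    ranking: List[str] = []

    while remaining and len(ranking) < k:

        def score(doc: str) -> float:
            novelty = sum(
                w * c * u
                for w, c, u in zip(aspect_weights, coverage[doc], uncovered)
            )
            return (1 - trade_off) * relevance[doc] + trade_off * novelty

        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        remaining.remove(best)
        # Aspects covered by the chosen document contribute less from now on.
        uncovered = [u * (1 - c) for u, c in zip(uncovered, coverage[best])]

    return ranking


# Hypothetical toy data: d1 and d2 cover the same aspect, d3 covers the other.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
coverage = {"d1": [1.0, 0.0], "d2": [1.0, 0.0], "d3": [0.0, 1.0]}
print(greedy_diversify(relevance, coverage, [0.5, 0.5], k=3))
```

In this toy example the less relevant `d3` is promoted above `d2` because `d1` has already covered the first aspect. The per-aspect coverage scores are also what makes this family of methods interpretable.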
AutoName: A Corpus-Based Set Naming Framework

https://doi.org/10.1145/3404835.3463100

Huang, Zhiqi; Rahimi, Razieh; Yu, Puxuan; Shang, Jingbo; Allan, James (July 2021, Proceedings of The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 21))
Inferring the set name of semantically grouped entities is useful in many tasks related to natural language processing and information retrieval. Previous studies mainly draw names from knowledge bases to ensure high quality, but that limits the candidate scope. We propose an unsupervised framework, AutoName, that exploits large-scale text corpora to name a set of query entities. Specifically, it first extracts hypernym phrases as candidate names from query-related documents via probing a pre-trained language model. A hierarchical density-based clustering is then applied to form potential concepts for these candidate names. Finally, AutoName ranks candidates and picks the top one as the set name based on constituents of the phrase and the semantic similarity of their concepts. We also contribute a new benchmark dataset for this task, consisting of 130 entity sets with name labels. Experimental results show that AutoName generates coherent and meaningful set names and significantly outperforms all compared methods. Further analyses show that AutoName is able to offer explanations for extracted names using the sentences most relevant to the corresponding concept.
Learning to Rank Entities for Set Expansion from Unstructured Data

https://doi.org/10.1145/3409256.3409811

Yu, Puxuan; Rahimi, Razieh; Huang, Zhiqi; Allan, James (September 2020, Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2020))
Entity set expansion (ESE) refers to mining "siblings" of some user-provided seed entities from unstructured data. It has drawn increasing attention in the IR and NLP communities for its various applications. To the best of our knowledge, there has not been any work towards a supervised neural model for entity set expansion from unstructured data. We suspect that the main reason is the lack of massive annotated entity sets. To address this problem, we propose and implement a toolkit called DBpedia-Sets, which automatically extracts entity sets from any plain text collection and can provide a large amount of distant supervision data for neural model training. We propose a two-channel neural re-ranking model, NESE, that jointly learns exact and semantic matching of entity contexts. The former accepts entity-context co-occurrence information and the latter learns a non-linear transformer from generally pre-trained embeddings to ESE-task specific embeddings for entities. Experiments on real datasets of different scales from different domains show that NESE outperforms state-of-the-art approaches in terms of precision and MAP, where the improvements are statistically significant and are higher when the given corpus is larger.
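To make the task concrete, here is a minimal sketch of a non-neural set expansion baseline that ranks candidate entities by how much their contexts overlap with those of the seeds. It only illustrates the idea of entity-context co-occurrence and is not the NESE model or the DBpedia-Sets toolkit: the function name `expand_seed_set`, the Jaccard scoring, and the toy context sets are assumptions made for illustration.

```python
from typing import Dict, List, Set


def expand_seed_set(
    seeds: List[str],
    entity_contexts: Dict[str, Set[str]],
    top_n: int,
) -> List[str]:
    """Rank non-seed entities by Jaccard overlap with the pooled seed contexts."""
    # Pool every context pattern observed with at least one seed entity.
    seed_contexts: Set[str] = set().union(*(entity_contexts[s] for s in seeds))

    scores: Dict[str, float] = {}
    for entity, contexts in entity_contexts.items():
        if entity in seeds:
            continue
        union = seed_contexts | contexts
        scores[entity] = len(seed_contexts & contexts) / len(union) if union else 0.0

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_n]


# Hypothetical toy contexts, as if extracted from a text corpus.
contexts = {
    "Paris": {"capital of", "city in"},
    "Berlin": {"capital of", "city in"},
    "Madrid": {"capital of", "city in"},
    "banana": {"fruit like"},
}
print(expand_seed_set(["Paris", "Berlin"], contexts, top_n=1))
```

A learned model would replace the fixed Jaccard score with a trained scoring function, but the input and output (seed entities in, ranked candidate siblings out) stay the same.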
Hide-n-Seek: An Intent-aware Privacy Protection Plugin for Personalized Web Search

https://doi.org/10.1145/3209978.3210180

Yu, Puxuan; Ahmad, Wasi Uddin; Wang, Hongning (July 2018, SIGIR '18 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval)

We develop Hide-n-Seek, an intent-aware privacy protection plugin for personalized web search. In addition to users' genuine search queries, Hide-n-Seek submits k cover queries and corresponding clicks to an external search engine. By mimicking the true query sequence, these cover queries disguise the user's search intent, which is otherwise grounded and reinforced over a search session. The cover queries are synthesized and randomly sampled from a topic hierarchy, where each node represents a coherent search topic estimated by both n-gram and neural language models constructed over crawled web documents. Hide-n-Seek also personalizes the returned search results by re-ranking them based on the genuine user profile developed and maintained on the client side. With a variety of graphical user interfaces, we present the topic-based query obfuscation mechanism to end users so that they can understand how their search privacy is protected.
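The cover-query idea can be sketched as follows: given the topic of the genuine query, sample k other topics and draw a query of the same length from each topic's language model. This is a simplified illustration under stated assumptions, not the plugin's implementation. The flat `TOPIC_MODELS` table stands in for the topic hierarchy, the unigram models stand in for the n-gram and neural language models, and the function name `sample_cover_queries` is hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical toy topics, each with a unigram language model (word -> probability).
TOPIC_MODELS: Dict[str, Dict[str, float]] = {
    "health/nutrition": {"vitamin": 0.4, "diet": 0.3, "protein": 0.3},
    "sports/soccer": {"league": 0.5, "goal": 0.3, "transfer": 0.2},
    "travel/flights": {"cheap": 0.4, "airline": 0.4, "baggage": 0.2},
    "tech/laptops": {"battery": 0.3, "review": 0.4, "screen": 0.3},
}


def sample_cover_queries(
    true_topic: str,
    query_length: int,
    k: int,
    rng: random.Random,
) -> List[str]:
    """Sample k cover queries from topics other than the user's true topic.

    Requires k to be no larger than the number of other topics.
    """
    candidates = sorted(t for t in TOPIC_MODELS if t != true_topic)
    cover_topics = rng.sample(candidates, k)

    covers: List[str] = []
    for topic in cover_topics:
        model = TOPIC_MODELS[topic]
        # Match the genuine query's length so cover queries look similar in form.
        words = rng.choices(list(model), weights=list(model.values()), k=query_length)
        covers.append(" ".join(words))
    return covers


# The genuine query is about nutrition; two cover queries are submitted alongside it.
print(sample_cover_queries("health/nutrition", query_length=2, k=2, rng=random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible for testing. A real deployment would presumably want unpredictable sampling.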